
“VIS(US) Stuttgart – Fuzzy Rule Hypothesis Graph”

VAST 2009 Challenge
Challenge 2 - Social Network and Geospatial

Authors and Affiliations:

Michael Wörner, GSaME – Universität Stuttgart, Michael.Woerner@gsame.uni-stuttgart.de
Harald Bosch, VIS – Universität Stuttgart
Steffen Koch, VIS – Universität Stuttgart

Tool(s):

We customized tools from previous developments of our department and adapted them to the requirements of the challenge. We integrated them into one application along with newly built tools.

These tools comprise a fuzzy logic rule evaluator that analytically determines which entities cannot hold a certain role, a table view of the current candidate entities for a given role, a hypothesis graph view, a graph view displaying the resulting networks, and a map display for investigating the national and international connections of candidate syndicates.

For the development, we mainly used the Java SDK, Apache libraries, and the prefuse visualization toolkit. Additionally, Microsoft Excel was used for some tasks.

Video:

Video.avi (DivX encoded)

ANSWERS:

MC2.1: Which of the two social structures, A or B, most closely matches the scenario you have identified in the data?

Social structure A.

MC2.2: Provide the social network structure you have identified as a tab-delimited file. It should contain the employee, one or more handlers, any middle folks, and the localized leader with their international contacts. What are the Flitter names of the persons involved? Please identify only key connections (not all single links, for example) as well as any other nodes related to the scenario (if any) you may have discovered that were not described in the two scenarios A and B above.

Flitter.txt

MC2.3: Characterize the difference between your social network and the closest social structure you selected (A or B). If you include extra nodes, please explain how they fit into your scenario or analysis.

To find the structures outlined by the scenario descriptions in the provided data set, we built a tool that partly reuses technology from previous projects. It restricts sets of entities (here, Flitter users) based on the number of contacts they have, optionally taking the roles of those contacts into account, and it supports fuzzy rule definitions such as “about 40 contacts”. Over the course of a few days, we adapted our existing tools for the challenge and complemented them with new ones. We then defined rules formalizing the scenario descriptions, for example “an employee knows roughly 40 contacts” or “a handler knows at least 1 middle man that at least 2 other handlers know”. Creating these rules took only a few minutes.
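Fuzzy rules such as “about 40 contacts” can be thought of as membership functions that map a contact count to a degree of fulfillment between 0 and 1. The following Python sketch is an illustration, not the tool's implementation (which was written in Java): it assumes a triangular membership for “roughly N” and a linear ramp for “at least about N”. The function names and the tolerances of 10 and 25 contacts are hypothetical choices.

```python
from typing import Callable

Membership = Callable[[float], float]


def roughly(target: float, tolerance: float) -> Membership:
    """Triangular membership: 1.0 at `target`, falling linearly to 0.0 at target +/- tolerance."""
    def membership(contacts: float) -> float:
        return max(0.0, 1.0 - abs(contacts - target) / tolerance)
    return membership


def at_least_about(threshold: float, tolerance: float) -> Membership:
    """Ramp membership: 1.0 at or above `threshold`, 0.0 at or below threshold - tolerance."""
    def membership(contacts: float) -> float:
        if contacts >= threshold:
            return 1.0
        return max(0.0, 1.0 - (threshold - contacts) / tolerance)
    return membership


# Hypothetical tolerances; the tool's actual parameters are not documented here.
about_40 = roughly(40, 10)               # "an employee knows roughly 40 contacts"
leader_size = at_least_about(125, 25)    # "a leader knows at least about 125 contacts"

print(about_40(40), about_40(35), about_40(55))  # 1.0 0.5 0.0
print(leader_size(130), leader_size(80))         # 1.0 0.0
```

The shape of the membership function decides how quickly confidence drops for near misses; a wider tolerance keeps more borderline candidates in play.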

Figure: The available rules.

We imported the provided data set into our tool and started with the initial hypothesis that every person is a candidate for every role. This results in 6000 candidates for each of the four primary roles ‘employee’, ‘handler’, ‘middle man’, and ‘leader’. By attaching rules to the starting hypothesis node via drag-and-drop, we created derived hypotheses, continuously reducing the candidate sets by eliminating those candidates that do not meet the rules of the given scenario. The rules we applied in sequence were “an employee knows roughly 40 contacts”, “a handler knows at least 1 employee”, “a handler knows roughly 30-40 contacts”, “a middle man knows at least 3 handlers”, “a leader knows at least 1 middle man”, “a leader knows at least about 125 contacts”, and “an employee knows at least 3 handlers”. All of these rules were derived directly from the description of scenario A. By requiring candidates to know “at least” as many contacts as the scenario specifies, we could exclude those who know fewer and therefore certainly do not fulfill the role requirements. For example, Flitter users with 20, 50, or even 80 contacts can be removed from the set of possible ‘leaders’, as they do not meet the scenario's requirement of “well over 100 contacts”. Since many of the rules contain only approximate values, we assigned confidence values to the candidates based on how well they meet a requirement. Because the structure we were looking for need not satisfy every scenario rule perfectly, the analyst can also assign a weight to each rule, which determines that rule's impact on the confidence calculation. The result is shown in the figure below.

Figure: The application of “safe rules” reduces the possible roles for many entities. Starting from 6000 candidates for each role, we obtain 116 / 298 / 775 / 22 candidates for employee (red) / handler (green) / middleman (cyan) / leader (magenta). The rule set on the right shows available (brighter) and already used (darker) rules. Dropping a rule next to a node creates a circular slider for adjusting the rule's weight and a node representing the resulting candidate set.
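The weighted confidence calculation might be sketched as follows. This is an assumption rather than the tool's documented formula: each rule's penalty (1 minus its membership) is scaled by the rule's weight, and the scaled factors are multiplied. Under this choice, a crisp “at least” rule with weight 1.0 acts as a safe rule that removes candidates outright, while a lower weight only lowers confidence. The names `roughly`, `at_least`, `weighted_confidence`, and `derive` and the sample contact counts are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Membership = Callable[[int], float]
Rule = Tuple[Membership, float]  # (membership over a contact count, weight in [0, 1])


def roughly(target: int, tolerance: int) -> Membership:
    # Hypothetical triangular membership for "roughly <target> contacts".
    return lambda n: max(0.0, 1.0 - abs(n - target) / tolerance)


def at_least(threshold: int) -> Membership:
    # Crisp "safe" rule: anyone below the threshold cannot hold the role.
    return lambda n: 1.0 if n >= threshold else 0.0


def weighted_confidence(contacts: int, rules: List[Rule]) -> float:
    """Product of weight-softened memberships: weight 1.0 applies a rule fully, 0.0 ignores it."""
    confidence = 1.0
    for membership, weight in rules:
        confidence *= 1.0 - weight * (1.0 - membership(contacts))
    return confidence


def derive(candidates: Dict[str, float], contacts: Dict[str, int],
           rules: List[Rule]) -> Dict[str, float]:
    """Derived hypothesis: keep only candidates whose confidence stays above zero."""
    derived = {}
    for person, prior in candidates.items():
        c = prior * weighted_confidence(contacts[person], rules)
        if c > 0.0:
            derived[person] = c
    return derived


contacts = {"@a": 40, "@b": 35, "@c": 80, "@d": 130}
start = {person: 1.0 for person in contacts}  # initially everyone is a candidate
employees = derive(start, contacts, [(roughly(40, 10), 0.5)])
leaders = derive(start, contacts, [(at_least(125), 1.0)])
print(employees)  # {'@a': 1.0, '@b': 0.75, '@c': 0.5, '@d': 0.5}
print(leaders)    # {'@d': 1.0}
```

With weight 0.5, the “roughly 40” rule keeps every candidate but ranks them; with weight 1.0, it would remove @c and @d entirely.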

Continuing from this point, we considered that a ‘middleman’ knows three ‘handlers’, the ‘leader’, at most one other member of the criminal organization, and no one else. As a blog answer confirmed, “no one else” refers to the entire Flitter network, so we added the rule “a middleman knows roughly 4-5 contacts” (three handlers and the leader, plus at most one other member). Since this substantial reduction of ‘middleman’ candidates did not affect the number of ‘handler’ candidates, we added the rule “a handler knows at least 1 middle man”, which left us with a set small enough to be visualized as a graph.
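Rules that refer to other roles, such as “a handler knows at least 1 middle man”, interact: removing a candidate from one role can invalidate candidates of another. One way to sketch this is to re-check all relational rules until the candidate sets stop changing (a fixpoint). The Python below assumes that strategy, encodes only rules quoted in this section, and runs on a hypothetical toy graph; it is not taken from the tool.

```python
from typing import Dict, List, Set, Tuple

# (role, other_role, minimum) encodes "a <role> knows at least <minimum> <other_role> candidates"
RelRule = Tuple[str, str, int]


def propagate(candidates: Dict[str, Set[str]], graph: Dict[str, Set[str]],
              rules: List[RelRule]) -> Dict[str, Set[str]]:
    """Remove candidates that violate a relational rule until no set changes (a fixpoint)."""
    sets = {role: set(people) for role, people in candidates.items()}
    changed = True
    while changed:
        changed = False
        for role, other, minimum in rules:
            keep = {p for p in sets[role] if len(graph.get(p, set()) & sets[other]) >= minimum}
            if keep != sets[role]:
                sets[role] = keep
                changed = True
    return sets


# Hypothetical toy graph: m2 knows three handlers but no leader; L2 knows no middleman.
edges = [("e1", "h1"), ("e1", "h2"), ("e1", "h3"), ("h1", "m1"), ("h2", "m1"), ("h3", "m1"),
         ("m1", "L"), ("h1", "m2"), ("h2", "m2"), ("h3", "m2"), ("L2", "x")]
graph: Dict[str, Set[str]] = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

candidates = {"employee": {"e1"}, "handler": {"h1", "h2", "h3"},
              "middleman": {"m1", "m2"}, "leader": {"L", "L2"}}
rules = [("employee", "handler", 3), ("handler", "employee", 1), ("handler", "middleman", 1),
         ("middleman", "handler", 3), ("middleman", "leader", 1), ("leader", "middleman", 1)]

result = propagate(candidates, graph, rules)
print({role: sorted(people) for role, people in result.items()})
# {'employee': ['e1'], 'handler': ['h1', 'h2', 'h3'], 'middleman': ['m1'], 'leader': ['L']}
```

Iterating to a fixpoint matters because each removal can cascade: dropping m2 could in turn leave a handler without any middleman.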

Figure: Possible criminal networks can be displayed by hovering over an entity. This network is incomplete because the ‘middleman’ (cyan) is not in contact with any ‘leader’ candidate.

In this graph view, entities are laid out in layers according to their roles; entities that are candidates for more than one role are represented by multiple visual items. Pointing at an entity highlights its links to the adjacent layers, which reveals that some ‘middlemen’ have no connection to any ‘leader’ candidate. Adding this last rule (“a middle man knows at least one leader”) results in the two possible networks shown below.

Figure: Two possible criminal networks. The highlighted one does not fully comply with the description because two of the handlers know each other.

In one of these networks, two of the ‘handlers’ (@bailey and @letelier) know each other, which contradicts the scenario description. Right-clicking @bailey and explicitly stating that he or she “is not a handler” removes this network from the hypothesis, because rules that are still in effect can no longer be met once @bailey is not a ‘handler’. This leaves only one network, with @shaffter as the ‘employee’ and @szemeredi as the ‘leader’. This result fulfills every aspect of the description of scenario A perfectly and is our primary solution.
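Once the candidate sets are small, complete networks can be enumerated and checked against scenario A, in which the handlers do not know each other. The sketch below assumes a simple chain search (an employee, three of its handler candidates, a middleman known by all three, and a leader known by that middleman); the toy graph and all identifiers are hypothetical. Excluding a handler, as the analyst did for @bailey, is modeled here by removing it from the candidate set and enumerating again, which simplifies the tool's rule-based mechanism.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# (employee, (handler, handler, handler), middleman, leader)
Network = Tuple[str, Tuple[str, ...], str, str]


def find_networks(cands: Dict[str, Set[str]], graph: Dict[str, Set[str]]) -> List[Network]:
    """Enumerate employee -> 3 handlers -> shared middleman -> leader chains."""
    found = []
    for employee in sorted(cands["employee"]):
        for trio in combinations(sorted(graph[employee] & cands["handler"]), 3):
            shared = graph[trio[0]] & graph[trio[1]] & graph[trio[2]] & cands["middleman"]
            for middleman in sorted(shared):
                for leader in sorted(graph[middleman] & cands["leader"]):
                    found.append((employee, trio, middleman, leader))
    return found


def handlers_independent(network: Network, graph: Dict[str, Set[str]]) -> bool:
    """Scenario A: no two handlers of the same network know each other."""
    return all(b not in graph[a] for a, b in combinations(network[1], 2))


edges = [("e1", "h1"), ("e1", "h2"), ("e1", "h3"), ("m1", "h1"), ("m1", "h2"), ("m1", "h3"), ("m1", "L1"),
         ("e2", "h4"), ("e2", "h5"), ("e2", "h6"), ("m2", "h4"), ("m2", "h5"), ("m2", "h6"), ("m2", "L2"),
         ("h4", "h5")]  # two handlers of the second network know each other
graph: Dict[str, Set[str]] = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)
cands = {"employee": {"e1", "e2"}, "handler": {f"h{i}" for i in range(1, 7)},
         "middleman": {"m1", "m2"}, "leader": {"L1", "L2"}}

nets = find_networks(cands, graph)
print([handlers_independent(n, graph) for n in nets])  # [True, False]
cands["handler"].discard("h4")  # analyst states: "h4 is not a handler"
print(find_networks(cands, graph))  # [('e1', ('h1', 'h2', 'h3'), 'm1', 'L1')]
```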

The international contacts of the persons involved in the network can easily be investigated by brushing over the candidate network view. All highlights are directly reflected in the linked map display.

After identifying the network, we checked the contacts table of the ‘middleman’ @good. Since, according to the scenario, his Flitter contact list contains only members of the criminal organization, this check reveals @moilanen as a possible additional member of the organization.
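Under the scenario's premise that a ‘middleman’ has only organization members as Flitter contacts, every one of his contacts outside the identified network is a candidate for an additional member. A minimal Python sketch of this set difference, with hypothetical identifiers:

```python
from typing import Dict, Set


def extra_members(middleman: str, graph: Dict[str, Set[str]], network: Set[str]) -> Set[str]:
    """Contacts of `middleman` that are not yet part of the identified network.

    Assumes (per the scenario) that a middleman only knows organization members.
    """
    return graph[middleman] - network


graph = {"m1": {"h1", "h2", "h3", "L1", "x1"}}
network = {"e1", "h1", "h2", "h3", "m1", "L1"}
print(extra_members("m1", graph, network))  # {'x1'}
```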

We then considered the slightly different description of scenario B, which states that “each of the middle men probably communicates with one or two others in the organization, and no one else”. In this context, this translates to “a middleman has 2-3 contacts” (one handler plus 1-2 others). However, since every Flitter user has at least 4 contacts, this rule would eliminate all ‘middleman’ candidates. Applying only the other “safe” rules reduces the candidate set to 3 users who have exactly 4 contacts, but assuming any of them to be the ‘middleman’ does not yield the expected network structure. Strictly applying the remaining criteria excludes every potential network in which the ‘employee’ and the ‘handlers’ have the correct number of contacts.

MC2.4: How is your hypothesis about the social structure in Part 1 supported by the city locations of Flovania? What part(s), if any, did the role of geographical information play in the social network of part one?

The ‘employee’ and the three ‘handlers’ identified by the analysis in MC2.3 are all from the same city, Prounov. The ‘middleman’ resides some distance away in Kannvic, Flovania, northwest of Prounov. ‘Fearless Leader’ himself is located farther north, in Kouvnic. This supports the social structure from Part 1: the ‘handlers’ stay in close contact with the ‘employee’, while the ‘middleman’ keeps a distance and reports to the ‘leader’, who resides even farther away. To check this observation, we created the rule “an employee knows at least 3 handlers from the same city”. It eliminates many ‘employee’ candidates whose confidence values were already reduced because they violate other fuzzy constraints such as “roughly 40 contacts”.

Figure: Applying the above-mentioned rule at this stage reduces the number of possible employees from 144 to 21, despite its low confidence penalty of 0.47.

Again, this structure can be easily validated using the interactive map view.
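The same-city rule from MC2.4 can be expressed as a predicate over the contact graph and a user-to-city mapping. This Python sketch is hypothetical: it interprets “from the same city” as “from the employee's own city”, treats the rule as crisp, and leaves out the rule weight that the tool applied. All names and city assignments are illustrative.

```python
from typing import Dict, Set


def knows_local_handlers(employee: str, graph: Dict[str, Set[str]], handlers: Set[str],
                         city: Dict[str, str], minimum: int = 3) -> bool:
    """True if the employee knows at least `minimum` handler candidates from the employee's city."""
    local = [h for h in graph[employee] & handlers if city[h] == city[employee]]
    return len(local) >= minimum


city = {"e1": "Prounov", "e2": "Koul", "h1": "Prounov", "h2": "Prounov",
        "h3": "Prounov", "h4": "Kouvnic"}
graph = {"e1": {"h1", "h2", "h3"}, "e2": {"h2", "h3", "h4"}}
handlers = {"h1", "h2", "h3", "h4"}

remaining = [e for e in ("e1", "e2") if knows_local_handlers(e, graph, handlers, city)]
print(remaining)  # ['e1']
```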

MC2.5: In general, how are the Flitter users dispersed throughout the cities of this challenge? Which of the surrounding countries may have ties to this criminal operation? Why might some be of more significant concern than others?

The dispersion of Flitter users can be calculated by exporting the contact tables, which contain the location and role confidence of each contact, from our tool to MS Excel and then aggregating the values with a Pivot Table. Since our tool can display the contact list for any set of entities, we can reuse the same Pivot Table for the contacts of the criminal organization alone by exporting their contact lists to Excel. The results show that the leader in particular has more contacts in Posana than average.

Overall dispersion:

City	Flitter users	% of all users
Koul	1998	33.30
Prounov	1707	28.45
Kouvnic	798	13.30
Kannvic	320	5.33
Solvenz	210	3.50
Pasko	147	2.45
Otello	147	2.45
Sresk	147	2.45
Ryzkland	142	2.37
Solank	135	2.25
Transpasko	126	2.10
Tulamuk	123	2.05

Origins of the contacts of the criminal network:

Country	Contacts of the network	% of network contacts	% of all users
Posana	9	2.62	2.45
Flovania	321	93.59	93.40
Transak	6	1.75	2.10
Trium	7	2.04	2.05
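The Pivot Table aggregation amounts to counting entries per location and dividing by the total. The following Python sketch is an illustrative equivalent of the Excel step, not part of the tool: it reproduces the network-contact percentages in the second table from its contact counts and compares them with the shares among all users. The function name `share_by` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def share_by(locations: Iterable[str]) -> Dict[str, float]:
    """Percentage of entries per location, like a one-field pivot table."""
    counts = Counter(locations)
    total = sum(counts.values())
    return {place: round(100 * n / total, 2) for place, n in counts.items()}


# Country of each contact of the criminal network (counts from the table above).
origins = ["Posana"] * 9 + ["Flovania"] * 321 + ["Transak"] * 6 + ["Trium"] * 7
network_share = share_by(origins)
print(network_share)  # {'Posana': 2.62, 'Flovania': 93.59, 'Transak': 1.75, 'Trium': 2.04}

# Shares among all 6000 Flitter users, taken from the same table.
overall_share = {"Posana": 2.45, "Flovania": 93.40, "Transak": 2.10, "Trium": 2.05}
difference = {place: round(network_share[place] - overall_share[place], 2) for place in overall_share}
print(difference)  # percentage points above (+) or below (-) the overall share
```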
